feat: Add range partitioning support by kitagry · Pull Request #174 · embulk/embulk-output-bigquery

kitagry · 2025-02-15T01:02:52Z

This pull request introduces support for range partitioning in BigQuery.

I checked this feature with example/config_replace_field_range_partitioned_table.yml

hiroyuki-sato

Thank you for creating this PR. I'll review it later. Before reviewing it, I have a question. Could you check my comment?

hiroyuki-sato · 2025-02-15T15:31:40Z

+  range_partitioning:
+    field: customer_id
+    range:
+      start: '1'


Why does this part use a string instead of an integer? As far as I know, range partitioning uses numbers, and we can check that the range is valid (start < end) if we use integers.

What do you think of this configuration layout?
(Do we need a range block?).

range_partitioning:
  field: customer_id # document uses `column` but this plugin uses `field`. [1]
  start: 1
  end: 1000
  interval: 10

# [1] https://cloud.google.com/bigquery/docs/creating-partitioned-tables#create_an_integer-range_partitioned_table

(This is just my opinion; I'd like to ask the co-maintainers about it.)

Thank you for the comment! I followed the api documentation. If you prefer integer, I'll change this!

https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#RangePartitioning

Do we need a range block?

I fixed it with 49d2c36.

Hello, @kitagry. Thank you for waiting.

Could you use this layout? (Sorry, we decided to use the original design, except with integers instead of strings for the range values.)

range_partitioning:
  field: customer_id
  range:
    start: 1     # integer not string.
    end: 99999   # integer not string.
    interval: 1  # integer not string.

Also, could you check that range start + interval < end?
If you have any concerns, please let me know.
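The requested check can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `RangeConfigError` and `validate_range_partitioning!` are hypothetical names, and the hash is assumed to have the `field` / `range.start` / `range.end` / `range.interval` layout shown above.

```ruby
# Hypothetical error class standing in for the plugin's ConfigError.
class RangeConfigError < StandardError; end

# Validates a range_partitioning hash shaped like the layout above.
# Returns the config unchanged when it is valid.
def validate_range_partitioning!(config)
  range = config['range']
  raise RangeConfigError, '`range` is required' unless range.is_a?(Hash)

  start, finish, interval = %w[start end interval].map do |key|
    value = range[key]
    # Integers (not strings) are what the review asks for.
    raise RangeConfigError, "`range.#{key}` must be an integer" unless value.is_a?(Integer)
    value
  end

  raise RangeConfigError, '`range.interval` must be positive' unless interval.positive?

  # The check requested in the review: start + interval < end.
  unless start + interval < finish
    raise RangeConfigError, '`range.start` + `range.interval` must be less than `range.end`'
  end

  config
end
```

With the layout above (`start: 1`, `end: 99999`, `interval: 1`) the check passes; a quoted value such as `start: '1'` is rejected before the comparison runs.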

I referenced the time_partition configurations.

The BigQuery API uses:

{
  "type": string,
  "expirationMs": string,
  "field": string,
  "requirePartitionFilter": boolean
}

The embulk configuration uses:

type: bigquery
table: table_name$20160929
time_partitioning:
  type: DAY
  expiration_ms: 259200000 # integer, not string; uses snake case

We discussed this using this design document. (Written in Japanese)

After modification, I'll check the partition feature.
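To illustrate the convention described here (integers and snake case in the embulk configuration, strings and camel case in the REST representation), a rough sketch of the conversion might look like this. The helper names are hypothetical and this is not how the plugin itself builds its API requests; it only shows the mapping between the two shapes.

```ruby
# Turns a snake_case key into camelCase, e.g. "expiration_ms" -> "expirationMs".
def camelize_key(key)
  key.gsub(/_([a-z])/) { Regexp.last_match(1).upcase }
end

# Maps an embulk-style time_partitioning hash to the REST-style shape.
# expiration_ms is an integer in the config and a string in the REST shape.
def to_api_time_partitioning(config)
  config.each_with_object({}) do |(key, value), api|
    api[camelize_key(key)] = key == 'expiration_ms' ? value.to_s : value
  end
end

# Maps an embulk-style range_partitioning hash (integer values) to the
# string-valued start/end/interval described in the REST reference.
def to_api_range_partitioning(config)
  range = config.fetch('range')
  {
    'field' => config.fetch('field'),
    'range' => {
      'start' => range.fetch('start').to_s,
      'end' => range.fetch('end').to_s,
      'interval' => range.fetch('interval').to_s
    }
  }
end
```

Keeping integers on the configuration side is what makes range validation straightforward; the conversion to strings can happen at the API boundary.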

hiroyuki-sato · 2025-02-17T04:53:19Z

@kitagry Thank you for the additional work on this PR. I'll review it, so please wait. I'm not the original plugin developer, so I need to investigate the configuration rules. If this plugin's configuration is based on the API settings, it would be better to use the original field.range.start. I'll talk to the co-maintainers about this.

kitagry · 2025-03-06T12:56:08Z

Hi @hiroyuki-sato, I changed the range-partitioning fields to integers in f8d039f.

hiroyuki-sato · 2025-03-11T00:15:33Z

@kitagry Thanks. I will test this PR later. Please wait for a while.

hiroyuki-sato · 2025-03-23T21:23:30Z

@kitagry Thank you for waiting. I'm so sorry, but I haven't been able to find time for the review. I'll review this in April.

kitagry · 2025-04-22T07:25:26Z

@hiroyuki-sato Hi. Could you review this?

hiroyuki-sato

Hello, @kitagry

Sorry for the late review. I was able to confirm that the main part of this PR works. I was able to create a partitioned table using this dummy data.
I added the id field using the following command.
I have some comments. Could you take a look?

nkf -w input.csv | awk -F, '{ print NR-1","$0 }' > curry.csv

in:
  type: file
  path_prefix: ./curry.csv
  parser:
    charset: UTF-8
    newline: LF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: id, type: long}
    - {name: name, type: string}
    - {name: kana, type: string}
    - {name: address, type: string}
    - {name: sex, type: string}
    - {name: age, type: long}
    - {name: birthday, type: timestamp, format: '%Y/%m/%d'}
    - {name: marriage, type: string}
    - {name: prefecture, type: string}
    - {name: mobile, type: string}
    - {name: carrier, type: string}
    - {name: curry, type: string}
out:
  type: bigquery
  mode: append
  project: project
  dataset: dataset
  auth_method: service_account
  json_keyfile: key.json
  auto_create_table: true
  table: part_test
  location: asia-northeast1
  range_partitioning:
    field: id
    range:
      start: 1
      end: 5001
      interval: 500
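As a rough illustration of what this range means, the sketch below enumerates the partition boundaries for `start: 1`, `end: 5001`, `interval: 500`, treating `start` as inclusive and `end` as exclusive. The function names are hypothetical, and clamping the last partition to `end` is an assumption that does not matter here because the range divides evenly.

```ruby
# Lists [lower, upper) boundaries for an integer-range partitioned table.
def partition_ranges(start, finish, interval)
  start.step(finish - 1, interval).map do |lower|
    # Assumption: the final partition is clamped to the exclusive end.
    [lower, [lower + interval, finish].min]
  end
end

# Returns the lower bound of the partition holding value, or nil when the
# value falls outside [start, finish).
def partition_for(value, start, finish, interval)
  return nil if value < start || value >= finish

  start + ((value - start) / interval) * interval
end

ranges = partition_ranges(1, 5001, 500)
puts ranges.length          # 10
puts ranges.first.inspect   # [1, 501]
puts ranges.last.inspect    # [4501, 5001]
```

With this configuration, ids 1 through 5000 land in one of ten partitions, and an id of 5001 or above falls outside the configured range.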

kitagry · 2025-04-24T00:29:46Z

Thank you for the review. I fixed them.

hiroyuki-sato

@kitagry Thank you for your work. I left two comments. Could you tell me your opinion?

hiroyuki-sato · 2025-04-24T08:55:56Z

          end
-        elsif Helper.has_partition_decorator?(task['table'])
+        # If user specify range_partitioning, it should be used as is
+        elsif Helper.has_partition_decorator?(task['table']) && task['range_partitioning'].nil?


Could you elaborate on this part? IIUC, `Helper.has_partition_decorator?(task['table'])` returns true if the table name contains `$`. Why did you add `task['range_partitioning'].nil?`?

I can't understand what "If user specify range_partitioning, it should be used as is" means.

In my opinion, these checks are preferable.

# can't use two partitions config at the same time.
if task['time_partitioning'] && task['range_partitioning']
  raise ConfigError.new ...
end

# partition decorator doesn't support range_partition (if needed)
if Helper.has_partition_decorator?(task['table']) && task['range_partitioning']
  raise ConfigError.new ...
end

Thank you for the review. I think that the following code would be preferable.

if Helper.has_partition_decorator?(task['table'])
  task['time_partitioning'] = {'type' => 'DAY'}
end

if task['time_partitioning'] && task['range_partitioning']
  raise ConfigError.new ...
end

Hmm, the error message would be confusing. I fixed it as in e9cc945. Could you check this?

(This is just a double check.) Does the error message mean "If user specify range_partitioning, it should be used as is"?

With code like the following, users who use a partition decorator may be confused, because they don't use time_partitioning.

if Helper.has_partition_decorator?(task['table'])
  task['time_partitioning'] = {'type' => 'DAY'}
end

if task['time_partitioning'] && task['range_partitioning']
  raise ConfigError.new "`time_partitioning` and `range_partitioning` cannot be used at the same time"
end

Oh, you mean that if the user uses the following configuration, the user may be confused by that message. Correct?
If so, your proposed change seems good.

table: part_test$200105
range_partitioning:
  field: id
  range:
    start: 1
    end: 5001
    interval: 500
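One way to keep the messages unambiguous for the configuration above is to report the decorator conflict separately from the `time_partitioning` conflict. The sketch below only illustrates that idea and is not necessarily what e9cc945 does; `SketchConfigError`, `partition_decorator?`, and `check_partitioning!` are hypothetical stand-ins, and the decorator check assumes (as noted in the thread) that a decorator means the table name contains `$`.

```ruby
# Hypothetical stand-in for the plugin's ConfigError.
class SketchConfigError < StandardError; end

# Assumption from the thread: a partition decorator means the table name contains "$".
def partition_decorator?(table)
  table.to_s.include?('$')
end

# Rejects conflicting partition settings with a message that names the
# option the user actually wrote.
def check_partitioning!(task)
  if task['time_partitioning'] && task['range_partitioning']
    raise SketchConfigError,
          '`time_partitioning` and `range_partitioning` cannot be used at the same time'
  end

  if partition_decorator?(task['table']) && task['range_partitioning']
    raise SketchConfigError,
          'a partition decorator in `table` cannot be used with `range_partitioning`'
  end

  task
end
```

Because the decorator case is checked on its own, a user who wrote `part_test$200105` together with `range_partitioning` sees a message about the decorator rather than about a `time_partitioning` option they never set.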

hiroyuki-sato

@kitagry Thank you for fixing the PR. Basically, LGTM. Could you check two minor comments?

hiroyuki-sato · 2025-04-28T05:36:03Z

@kitagry Thank you for your work. LGTM. I'll re-check this change and request co-maintainer review.

hiroyuki-sato

@kitagry, could you fix the test and check my comments? I'll approve this PR after the test is fixed. (I'll then request a check from a co-maintainer.)

kitagry · 2025-04-30T04:45:31Z

Thank you for the comment. I'll take a look tomorrow 🙏

Co-authored-by: Hiroyuki Sato <hiroysato@gmail.com>

hiroyuki-sato

@kitagry LGTM👍

@joker1007 Could you take a look at this PR?

hiroyuki-sato · 2025-05-13T08:56:27Z

@joker1007 Thanks!

feat: Add range partitioning support

adf5210

hiroyuki-sato self-requested a review February 15, 2025 15:47

hiroyuki-sato reviewed Feb 15, 2025

kitagry force-pushed the add-range-paritioned branch from 49d2c36 to f8d039f Compare March 6, 2025 12:54

hiroyuki-sato reviewed Mar 7, 2025

Comment thread lib/embulk/output/bigquery.rb Outdated

fix: change range-partitioning fields to be int

5398f69

kitagry force-pushed the add-range-paritioned branch from f8d039f to 5398f69 Compare March 9, 2025 06:26

hiroyuki-sato reviewed Apr 22, 2025

Comment thread lib/embulk/output/bigquery/bigquery_client.rb Outdated

Comment thread README.md Outdated

hiroyuki-sato reviewed Apr 24, 2025

fix: time_partitioning and range_partition config error

e9cc945

kitagry force-pushed the add-range-paritioned branch from 1e0f108 to e9cc945 Compare April 26, 2025 05:00

hiroyuki-sato reviewed Apr 28, 2025

Comment thread lib/embulk/output/bigquery.rb Outdated

Comment thread lib/embulk/output/bigquery.rb Outdated

hiroyuki-sato reviewed Apr 29, 2025

Comment thread test/test_configure.rb Outdated

Comment thread lib/embulk/output/bigquery.rb Outdated

fix: apply suggestions from code review

068acba

Co-authored-by: Hiroyuki Sato <hiroysato@gmail.com>

kitagry force-pushed the add-range-paritioned branch from de9c8be to 068acba Compare May 1, 2025 14:21

hiroyuki-sato approved these changes May 2, 2025

hiroyuki-sato requested a review from joker1007 May 2, 2025 00:25

joker1007 approved these changes May 13, 2025

hiroyuki-sato merged commit 9793f84 into embulk:master May 13, 2025
2 checks passed

hiroyuki-sato mentioned this pull request May 13, 2025

time partition #171

Closed

t3t5u mentioned this pull request Aug 5, 2025

Fixed issue where time_partitioning settings were overridden trocco-io/embulk-output-bigquery#37

Merged
